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Introduction 


The  problem  of  determining  discontinuities  in  displacement  vector  fields  arises  in 
both  Binocular  Stereo  and  Motion  analysis.  It  has  received  much  attention  in 
computer  vision  research.  There  are  two  ways  of  viewing  this  problem.  It  can  be 
seen  as  an  “edge”  detection  problem  in  a  vector  field,  or  it  can  be  seen  as  a  problem 
of  finding  regions  with  common  properties.  Both  views  are  complementary  to  each 
other.  The  latter  view  relates  the  question  of  discontinuities  in  vector  fields  to 
the  old  problem  of  segmentation,  with  the  important  difference  that  regions  are 
not  defined  by  the  the  image  property  “gray  value”,  but  by  the  object  property 
“displacement”.  In  the  field  of  displacement  vector  field  estimation  there  are, 
however,  few  attempts  to  relate  this  to  segmentation  approaches.  The  domain  of 
segmentation  techniques  have  been  traditionally  and  still  are  gray  value  related 
features,  but  hardly  vector  fields  as  they  appear  in  Stereo  or  motion. 

The  reason  for  this  is  the  fact  that  local  displacement  vector  field  (DVF)  esti¬ 
mates  have  large  variations  in  their  variance,  which  is  not  only  due  to  rmi-se  effects, 
but  is  inherent  to  the  displacement  estimation  itself,  depending  on  the  image  geom¬ 
etry.  In  “normal”  segmentation  tasks  the  variance  of  a  modelled  regional  property 
is  mainly  determined  by  noise,  and  therefore  the  task  can  in  principle  be  solved 
with  an  adequate  noise  model.  An  excellent  example  of  this  is  described  in  [Lec89]. 

A  number  of  approaches  have  been  applied  to  solve  the  problem  of  finding 
displacement  information  and  the  discontinuities  of  it.  It  seems  reasonable  to 
divide  them  into  three  different  strategies. 

1.1  Current  Approaches 

1.1.1  Discontinuities  from  Displacement  Vector  Field 

The  first  approach  separates  the  problems  of  the  determination  of  the  displace¬ 
ments  and  the  detection  of  discontinuities.  In  the  first  step  it  is  assumed  that 
there  are  no  discontinuities,  and  a  fully  regularized  solution  is  determined.  Exam¬ 
ples  of  this  strategy  are  the  “Optical  Flow”  algorithm  [HS81],  surface  interpolation 
applying  the  model  of  the  thin  plate  [Gri81,  Ter83],  or  the  membrane  model  in 
matching  [Bro8l],  and  a  series  of  motion  estimation  schemes  [Nag83,  Hil84,  Den86, 
NE86,  Ana87,  TWK87].  The  detection  of  discontinuities  follows  as  a  second  step 
on  the  basis  of  “tensions”  in  the  determined  vector  field.  These  can  in  principle  be 
found  by  significant  gradients  [DS88]  or  by  the  zero-crossings  of  the  second  deriva¬ 
tive  of  the  flow  field  [TMB85].  It  must  be  stressed  that  up  to  now  only  this  type 
of  approach  for  finding  a  dense  displacement  vector  field  has  a  time  complexity 
that  allows  a  real-time  implementation  when  an  appropriate  control  strategy  in 
combination  with  an  efficient  multi-grid  solver  is  used  [SD89]. 

It  has  been  shown  that  the  optical  flow  allows  qualitative  statements  about 
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objects’  motion,  but  that  the  quantitative  optical  flow  is  not  a  reliable  measure  of 
the  real  motion  [VP87,  SU87].  This  applies  even  more  to  the  derivatives  of  the  flow 
field.  It  can  be  expected,  and  indeed  it  can  be  demonstrated,  that  discontinuities 
determined  in  this  manner  deviate  considerably  from  the  actual  ones  in  real-world 
sequences. 

Another  approach  was  the  segmentation  of  the  optical  flow  field  with  a  model 
based  Hough  transform,  using  the  constraints  of  3D  rigid  body  motion  [Adi85] . 

1.1.2  Simultaneous  estimation  of  Discontinuites  and  Vector  Field 

An  appealing  approach  from  a  theoretical  point  of  view  is  to  minimize  a  single 
functional,  where  different  processes  are  combined.  Typical  elements  of  this  func¬ 
tional  are  a  similarity  function  between  potentially  corresponding  image  patches, 
the  regularization  conditions,  and  a  line  or  edge  process,  that  limits  the  regular¬ 
ization  and  therefore  allows  “continous  disco"’.. mutes”.  Following  this  approach 
the  determination  of  the  displacements  and  the  detection  of  discontinuities  are 
interwoven  [KMY86,  WW85]. 

The  first  problem  with  this  approach  is  that  the  constructed  functional  is  no 
longer  convex,  it  has  local  minima  that  can  be  far  away  from  the  global  minimum. 
(Strictly  speaking,  the  functional  of  the  continous  approach  is  not  convex  either, 
the  convexity  is  enforced  by  a  local  approximation  of  the  gray  values,  and  most  local 
minima  are  avoided  by  a  coarse- to-fine  control  strategy  [NE86,  Ana87,  TWK87, 
DS88]). 

A  more  severe  problem  is  the  open  question  of  how  to  weigh  the  different  parts 
of  the  functional.  Typically  these  coupling  constants  are  determined  heuristically. 
It  has  been  shown  that  the  optimal  choice  depends  on  the  image  content,  and  can 
vary  within  an  image  [Yui87].  In  the  extreme  case  so  many  free  parameters  are 
introduced  that  the  entire  approach  is  led  ad  absurdum,  since  its  elegance  is  based 
upon  the  idea  of  reducing  a  complex  problem  to  a  simple  principle.  The  approach 
can  only  be  succesful  when  these  parameters  can  be  found  from  apriori  principles 
and  the  image  data  themselves.  Whether  the  proposed  min-max  principle  rGY89], 
where  the  functional  is  maximized  with  respect  to  the  coupling  parameters  and 
minimized  with  respect  to  the  displacements,  is  adequate  for  real  image  sequences, 
is  an  open  question,  especially  with  space-variant  parameters.  Also,  the  computa¬ 
tional  costs  are  immense,  because  for  every  configuration  of  parameters  the  entire 
minimization  problem  has  to  be  solved. 

1.1.3  Discontinuities  without  full  Displacement  Vector  Field 

This  approach  also  separates  the  two  problems,  the  discontinuities  are,  however, 
detected  first.  When  this  segmentation  problem  is  solved,  then  the  computation 
of  the  displacements  becomes  simple,  often  just  a  least  squares  estimation  or  the 
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minimization  of  a  simple  quadratic  functional.  This  segmentation  is  also  a  pre¬ 
condition  to  the  direct  determination  of  3D  structure  or  motion  [TH84,  NH85]. 
It  may  seem  to  be  an  impossible  task  to  estimate  the  discontinuities  before  the 
vector  field  itself,  but  there  is  already  encouraging  evidence  that  such  informa¬ 
tion  can  indeed  be  found.  One  approach  in  this  direction  is  the  analysis  of  local 
histograms  of  possible  displacements  and  the  search  for  “vanishing  bars”  [SU87], 
and  another  is  the  detection  of  significant  non-vanishing  divergence  of  the  local 
non-iteratively  determined  displacement  vector  field  [NA89].  While  this  type  of 
approach  is  appealing  due  to  its  noniterative  character,  these  results  are  at  best 
preliminary,  and  a  precise  analysis  of  their  reliability  is  missing.  These  approaches 
indicate,  however,  that  it  is  indeed  possible  to  extract  the  important  information 
about  discontinuities  at  a  very  early  stage  of  the  vision  process. 

1.2  Observations  which  motivate  a  new  scheme 

For  the  rWelopernent  of  a  new  scheme  to  the  detection  of  displacement  disconti¬ 
nuities  and  the  segmentation  of  motion  fields  the  following  observations  will  be  of 
importance: 

•  The  vector  field  determined  by  a  regularization  approach  such  as  the  “optical 
flow”  or  a  gaussian  weighted  least  squares  fit  is  more  or  less  correct  in  the 
interior  of  a  displaced  region,  apart  from  the  distorting  effects  discussed  in 
[VP87].  The  size  of  the  region  determines  how  much  smoothing  of  the  vector 
field  is  appropriate.  Within  these  limits  the  noise  is  better  suppressed  and 
thus  the  vector  field  is  more  reliable,  if  more  smoothing  is  applied.  The  vector 
field  changes  dramatically,  however,  when  the  smoothing  is  too  strong. 

•  In  the  interior  of  a  region  the  measured  local  data  are  compatible  with  the 
regularized  vector  field.  The  only  deviations  are  due  to  noise. 

•  Near  the  occlusion  boundaries  the  estimates  of  the  regularized  fields  devi¬ 
ate  significantly  from  the  actual  displacement.  Near  occlusion  boundaries 
a  less  regularized  vector  field  reflects  the  actual  displacement  better  than  a 
smoother  vector  field,  while  at  the  same  time  it  is  more  sensitive  to  noise.  In 
particular  the  measured  data  themselves  are  significantly  incompatible  with 
the  regularized  vector  field.  The  consequence  of  this  is  that  close  to  occlu¬ 
sion  boundaries  essentially  all  members  of  a  family  of  regularized  solutions 
are  incompatible  with  each  other. 

•  An  observation  made  by  David  Marr  is  that  discontinuities  of  any  given 
dimension  are  relatively  sparse  and  they  are  continuous  in  the  next  lower 
dimension  [Mar82].  This  means  that  discontinuites  of  2D  regions  are  essen¬ 
tially  continous  lines.  These  lines  can  have  again  relatively  few  directional 
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discontinuities  in  the  form  of  endstops  and  breakpoints.  These  are,  however, 
not  considered  in  this  paper,  but  a  smooth  and  closed  occlusion  boundary  is 
assumed. 

•  If  there  is  an  actual  boundary,  it  shows  up  at  multiple  scales.  This  was 
formulated  by  David  Marr  as  the  spatial  coincidence  assumption  [Mar82]. 

•  The  detection  of  boundaries  due  to  motion  alone  is  instantaneous  for  the 
human  visual  system.  This  can  be  easily  found  out  with  a  moving  random 
dot  pattern.  Therefore  there  must  be  a  fast  mechanism  with  (at  most)  a  few 
iterations  to  find  these  discontinuities. 

As  a  consequence  of  these  observations  the  following  strategy  is  followed. 


1.3  General  strategy 

At  the  beginning  of  the  process  a  family  of  regularized  displacement  vector  fields  is 
determined.  According  to  [VP87]  it  is  not  critical  which  particular  regularization 
method  is  applied.  Therefore  a  computationally  cheap  noniterative  approach  of  lo¬ 
cal  weighted  least  squares  estimates  is  followed.  With  this  approach  the  calculation 
of  the  family  of  regularized  vector  field  can  be  decomposed  into  the  construction  of 
a  kind  of  Gaussian  pyramid  of  moments  and  a  pointwise  calculation  at  each  node  of 
the  pyramid.  With  this  procedure  not  only  the  actual  displacement  estimates  but 
also  the  sum  of  squared  deviations  can  be  determined.  Each  level  of  this  pyramid 
represents  one  level  of  regularization,  with  the  measured  data  at  the  bottom,  and 
the  global  average  displacement  at  the  top. 

The  most  critical  part  of  the  procedure  is  to  calculate  the  compatibilites  be¬ 
tween  the  different  levels  of  the  pyramid.  It  will  be  shown  that  the  traditional 
Euclidian  distance  between  properties  corresponds  to  the  difference  of  the  sums  of 
squared  deviations,  when  only  mean  values  are  estimated.  For  comparing  regres¬ 
sion  coefficients  as  in  the  case  of  displacement  information  euclidian  distances  are 
not  suitable. 

The  difference  of  sums  of  squared  deviations  with  respect  to  the  displacement 
model  is  a  candidate  for  a  general  compatibility  measure,  with  the  nice  properties  of 
a  metric.  There  are  two  drawbacks,  however.  First  there  is  no  inherent  interpreta¬ 
tion  in  this  measure,  which  also  means  that  for  quantitative  use  a  heuristic  scaling 
parameter  needs  to  be  introduced.  This  occurs,  wherever  the  sums  of  squared 
deviation  or  euclidian  distances  are  used  as  compatibility  measures.  Secondly  it  is 
impossible  to  use  the  sums  of  squared  deviations  for  comparing  different  models 
which  all  explain  the  data.  In  motion  interpretation  this  could,  for  example,  be 
pure  translation  versus  translation  with  rotation  and  scaling.  The  sums  of  squared 
deviations  is  a  monotonously  decreasing  function  of  the  number  of  parameters,  and 
it  vanishes  when  there  are  as  many  parameters  as  observations. 
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In  order  to  overcome  these  two  drawbacks,  the  criterion  of  minimal  description 
length  (MDL)  can  be  used  as  a  compatibility  measure  to  compare  displacement 
models  at  different  levels  of  the  pyramid.  Although  this  compatibility  measure  is 
not  a  metric,  it  has  a  probabilistic  interpretation  and  it  can  be  used  to  compare 
models  of  different  complexity.  Even  when  only  a  single  model  is  used,  compati¬ 
bility  defined  by  the  MDL  criterion  is  less  sensitive  to  structural  variations  in  the 
images  than  the  other  measures. 

The  noisy  evidence  of  each  individual  scale  leads  to  the  problem  of  combining 
evidence  from  multiple  locations  and  scales.  Due  to  the  weak  contrast  of  local 
measurements  of  displacement  discontinuities,  the  “continuity  of  discontinuities” 
and  the  “spatial  coincidence  assumption”  [Mar82]  serve  as  guiding  principles,  al¬ 
lowing  displacement  discontinuities  only  if  there  is  combined  evidence  from  the 
neighborhood  and  from  several  scales. 

A  pyramidal  structure  with  the  local  links  between  its  levels  opens  the  possi¬ 
bility  of  constructing  a  model  that  makes  use  of  these  constraints. 

1.4  Outline  of  the  paper 

The  next  section  discusses  the  minimum  description  length  approach,  in  particular 
with  its  application  to  the  regression  problem. 

In  section  3  the  question  of  compatibility  of  two  sets  of  data  with  respect  to  a 
given  model  is  discussed.  The  displacement  estimation  problem  is  described  as  a 
regional  regression  problem.  Three  different  compatibility  measures  are  presented. 

Section  4  describes  the  actual  scheme  where  the  individual  steps  are  described 
and  illustrated  with  an  example  of  a  random  dot  stereogram.  At  the  end  the  result 
of  applying  the  algorithm  to  a  real-world  image  pair  from  a  sequence  is  shown. 

Section  5  concludes  the  paper  with  a  discussion  of  the  approach  and  an  outlook 
to  possible  extensions. 


2  The  Minimum  Description  Length  Approach 

The  extension  of  the  widely  applied  maximum  likelihood  principle  is  motivated  by 
the  question  what  an  overall  best  model  for  a  given  set  of  data  is.  In  order  to 
have  a  clear  definition  of  what  a  “best”  model  is,  not  only  the  goodness  of  fit  of  a 
model,  but  also  the  complexity  of  the  model  itself  have  to  be  considered.  In  this 
sense,  the  “best”  model  is  the  simplest  that  can  explain  a  given  set  of  data.  While 
this  as  a  general  statement  has  been  known  at  least  since  “Occam’s  Razor”  and 
has  been  a  philosophical  guideline  in  the  developement  of  the  natural  sciences,  it 
is  far  from  trivial  to  formalize  this  principle  in  such  a  way  that  it  is  applicable  to 
determine  a  specific  model. 
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2.1  General  background  of  minimum  description  length 

The  foundations  of  a  formalization  were  laid  in  information  theory,  by  the  intro¬ 
duction  of  an  algorithmic  definition  of  information  [Sol64,  Kol65,  Cha66],  which 
defines  the  information  content  of  a  string  by  the  minimal  length  of  a  program  to 
describe  this  string.  A  limitation  to  the  application  of  this  idea  was  already  set 
by  one  of  its  inventors,  stating,  that  no  program  can  automatically  generate  such 
a  program  or  description  of  minimal  length  [Kol65].  This  essentially  rules  out  a 
fully  automated  process  of  general  induction.  It  does  not  rule  out,  however,  the 
testing  of  any  number  of  models — invented  by  human  ingenuity — by  this  criterion. 
This  latter  approach  mirrors  the  philosophy  of  critical  rationalism  widely  adopted 
in  the  field  of  science  [Pop59]. 

The  practical  applications  and  further  theoretical  justifications  of  this  concept 
were  developed  independently  by  C.S.  Wallace  and  his  coworkers  [WB68,  BW73, 
WB75,  GW84,  WF87]  and  by  J.  Rissanen  [Ris76,  Ris78,  HR82,  Ris83,  Ris86, 
Ris87].  Both  envisage  primarily  the  statistical  problem,  how  to  choose  an  optimal 
set  of  parameters  within  a  given  model  class.  The  length  of  the  description  of 
the  parameters  is  then  interpreted  as  the  negative  logarithm  of  a  Bayesian  prior, 
and  the  best  model  is  found  by  maximizing  the  product  of  conditional  and  prior 
probabilities.  Putting  it  another  way,  the  optimal  model  is  selected  by  minimizing 
the  sum  of  the  negative  binary  loglikelihood  of  the  data  given  the  model,  and  the 
length  of  the  shortest  code  to  describe  the  parameters.  Although  the  approach 
appears  to  be  formally  Bayesian,  there  are  fundamental  differences.  The  most 
prominent  difference  is  that  the  relevant  distribution  for  the  MDL  concept — the 
“marginal  prior  distribution”  [WF87]  or  the  “stochastic  complexity”  [Ris87]-is  a 
distribution  on  the  data,  and  not  on  a  model. 

One  key  question  to  make  the  approach  feasible  is,  how  to  encode  and  there¬ 
fore  how  to  optimally  truncate  the — typically  real-valued — parameters.  The  an¬ 
swer  was  found  by  both  researchers  independently,  although  the  resulting  formulas 
look  rather  different,  and  appear  to  be  partly  inconsistent.  The  parameters  are 
truncated  in  such  a  way,  that  the  truncation  error  equals  the  statistical  estimation 
error  of  the  model  (see  e.g.  [Ris83,  WF87]). 

2.2  Minimum  Description  Length  of  Regression 

One  important  motivation  to  apply  the  principle  of  Minimum  Description  Length 
to  the  regression  problem  is  related  to  the  question  of  the  best  choice  of  regressor 
variables.  Specifically  when  modelling  a  ID  signal  or  a  2D  surface  with  piecewise 
polynomials  naturally  one  has  to  somehow  determine  the  optimal  degree  of  the 
elements.  Interweaved  with  this  is  the  problem  of  finding  the  optimal  breakpoints 
in  ID  or  the  region  boundaries  in  2D.  Both  problems  can  be  formulated  in  terms 
of  a  descriptive  language.  By  minimizing  the  length  of  the  code  in  this  language 


the  optimal  description  is  found.  The  gray  value  segmentation  problem  has  been 
treated  in  this  spirit  by  Y.  Leclerc  [Lec89]. 

The  main  emphasis  here  is  laid  on  measuring  the  statistical  compatibility  of 
two  sets  of  data  with  respect  to  a  given  model.  This  also  means  to  find  the 
optimal  degree  of  a  polynomial.  The  partitioning  problem  is  here  approached  in 
a  way  that  is  based  on  fundamental  insights  about  discontinuities  in  the  visual 
input  data,  thereby  avoiding  the  difficult  and  time-consuming  minimization  of 
functionals  that  include  explicit  boundary  models. 

2.2.1  Selection  of  variables 

Estimating  the  best  degree  of  a  polynomial  for  a  given  set  of  data  points  is  a  spe¬ 
cial  case  of  the  general  selection  of  variables  problem.  The  model  likelihood  with 
the  maximum  likelihood  estimates  of  the  regressor  variables  naturally  increases 
with  the  number  of  variables  used.  Therefore  the  otherwise  powerful  maximum 
likelihood  principle  fails  to  give  an  answer  to  the  important  question,  which  vari¬ 
ables  explain  the  observed  data  best.  The  description  length  approach  extends 
the  maximum  likelihood  principle  by  including  a  term  that  measures  the  model 
complexity. 

2.2.2  A  simple  measure  of  model  complexity 

An  intuitive  approach  to  capture  the  notion  of  complexity  is  the  following  [Foe89]. 
When  the  variables  explain  the  data  optimally,  then  the  residual  variance  is  entirely 
due  to  noise.  Assuming  Gaussian  noise,  the  standard  deviation  of  the  noise  is 
proportional  to  y/ra,  with  n  the  number  of  observations.  A  “good”  variable,  which 
“explains”  the  data,  should  therefore  reduce  the  residual  standard  deviation  by 
more  than  \fn.  Thus  a  good  term  to  express  the  cost  of  k  parameters  is 

2  loS2n 

The  coding  length  interpretation  of  this  term  is  that  in  the  presence  of  Gaussian 
noise  for  the  required  precision  each  parameter  needs  to  be  encoded  with  |  log2  n 
bit.  The  complete  description  length  L(x\9)  for  a  model  with  the  parameter  vector 
9  and  the  n  observations  x  therefore  is 

L(x\9)  =  -log2p(z|0)  +  ^log2n  (1) 

This  criterion  has  been  derived  by  J.  Rissanen  [Ris78]  and  independently  by  G. 
Schwarz  [Sch78],  who  derived  it  for  the  exponential  family  of  distributions  without 
referring  to  description  length. 

Despite  its  apparent  simplicity,  for  large  enough  n  the  term  |  log2  n  can  be 
proven  to  represent  the  asymptotically  optimal  model  complexity  [Ris89]. 
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2.2.3  Minimum  description  length  for  multiple  regression 

For  a  finite  number  of  observations,  however,  another  approximation  of  the  de¬ 
scription  length  is  needed.  This  criterion  has  been  independently  derived  by  C.S. 
Wallace  [WF87]  and  J.  Rissanen  [Ris87].  Let  the  general  linear  model  of  multiple 
regression  be  of  the  form 

k 

Hi  =  £  9ij  *  (2) 

j= i 

which  is  in  matrix  notation 

p  =  G-9  (3) 

where  G  represents  the  set  of  regressor  variables  and  9  is  the  parameter  vector  to 
be  determined.  The  n  measurements  x,  are  assumed  to  be  independent  samples 
from  a  normal  distribution 

p(x\fi )  =  e-1  (4) 

The  least  squares  estimator  9*  of  the  parameter  vector  9  is  determined  by  mini¬ 
mization  of  the  sum  of  squared  deviations  5  with  respect  to  9: 

S  =  (x  -  fi)T (x  -  fi)  (5) 

9*  =  (GtG)~1Gtx  (6) 

Therefore  the  optimized  sum  of  squared  deviations  S*  is 

5*  =  xrx  -  9*tGtG9*  (7) 

and  the  probability  at  the  least  squares  estimate  9*  is 

/  5*\ "? 

p(x  —  G  -  9*)  ~  \  2ire — J  (8) 

According  to  both  authors  the  generalized  description  length  becomes 

L(x)  =  |  log2  27re  +  |  log2  ^  log2  \GTG\  (9) 

The  first  two  terms  are  the  negative  log  of  the  probability  in  equation  8.  The  last 
term,  the  log  of  the  determinant  of  the  curvature  of  the  log- likelihood  function 
at  its  maximum,  represents  the  complexity  of  the  parameters.  The  interpretation 
in  terms  of  description  length  is,  that  the  curvature  of  the  negative  log-likelihood 
function  provides  a  yardstick  by  which  the  precision  of  the  parameters  is  deter¬ 
mined.  The  parameters  are  considered  to  be  of  optimal  precision,  when  the  error 
caused  by  truncation  of  the  parameters  is  the  same  as  the  statistical  error  of  the 
measurement. 
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It  needs  to  be  mentioned  that  this  formulation  of  the  description  length  is  not 
invariant  with  respect  to  the  scale  of  the  data.  It  also  becomes  undefined,  when 
the  so  called  design  matrix  GTG  becomes  singular,  which  can  easily  happen  in  a 
local  analysis  of  image  properties.  In  the  framework  of  “Stochastic  Complexity” 
there  is,  however,  a  formulation  of  the  description  length,  which  is  always  defined 
and  which  is  completely  scale  invariant  [Ris89].  It  is  briefly  described  in  Appendix 
D.  From  there  it  can  be  seen  that  equation  9  is  an  approximation  to  the  exact 
description  length. 


3  A  model  based  homogenity  criterion 

Here  a  criterion  of  homogenity  is  developed  that  is  based  on  the  minimal  descrip¬ 
tion  length  of  data  with  respect  to  a  model.  It  is  shown  to  be  a  logical  extension  of 
euclidian  distance  used  in  gray  value  segmentation  and  the  sum  of  squared  devia¬ 
tions  applied  as  a  merging  criterion  in  the  context  of  hierarchical  Cluster  Analysis 
and  traditional  Split-and  Merge  algorithms. 

In  order  to  show  the  correspondence  to  existing  models  for  gray  value  segmen¬ 
tation,  the  simple  case  of  the  ordinary  gray  value  segmentation  problem  is  analysed 
first  with  a  model  based  view.  Then  the  ID  displacement  estimation  with  a  locally 
constant  vector  field  is  discussed.  It  will  be  shown  that  this  extends  easily  to  2D 
and  to  more  general  and  complex  models. 

3.1  Homogenity  for  gray  value  segmentation 

The  model  of  the  image  /  is  a  set  of  regions  Rk  with  constant  gray  values  within 
the  regions,  each  region  consisting  of  raj,  pixels  with  the  gray  value  X{.  With  a 
Gaussian  noise  model  the  mean  values  and  the  variances  crk  are  a  sufficient 
statistics  to  describe  each  region: 


?({*»} W)  =  n — 7^=  • e  7 

itR„  *k  •  v2tt 


= 

l  .iy 

{«>  ■  v^r 

(10) 

The  parameters  of  interest,  the 
deviations  S*  are  determined: 

estimated  mean  value  n*k  and  the  sums 

of  squared 

II 

I- 

M 

H 

(ID 

sk 

=  Y  (*<  - 

(12) 

itRk 
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=  n  z,2  -  nk  • 

icRk 

From  the  definition  of  S*  it  is  obvious  that  it  grows  monotonously  with  each 
additional  Xj  that  is  added  to  the  region.  Therefore  it  is  possible  to  define  a 
feature  distance  measure  between  a  region  Rk  and  an  Xj  as  the  increase  of  Sk 
when  Xj  is  included  in  Rk.  This  generalizes  to  the  distance  between  two  regions 
Rk  and  Ri: 

D(k,l)  =  Skui  -  Sk  -  St  (13) 

where  Skul  are  the  sum  of  squared  deviations,  when  Rk  and  Ri  are  joined.  Another 
view  of  this  distance  measure  is  the  decrease  of  Skul  when  a  member  or  a  subregion, 
say  Rk,  of  region  Rkul  is  taken  out  of  that  region.  Trivially  the  sum  of  squared 
deviations  of  a  single  item  is  0. 

With  the  simple  model  discussed  here,  the  Sk  don’t  have  to  be  calculated 
explicitly  in  order  to  determine  the  distance  measure.  As  can  be  easily  shown,  the 
distance  D{k,l )  between  two  regions  Rk  and  Ri  becomes 

d(m)  = •  w;  - /*;)’  (i4) 

7lfc  +  Til 

This  relates  the  introduced  distance  measure  to  the  traditional  Euclidian  dis¬ 
tance  between  mean  values  [Pav82j,  the  only  difference  being  the  weighting  term, 
which  depends  on  the  size  of  the  regions  involved. 

3.2  Homogenity  of  displacement  vector  fields 

The  homogenity  measure  is  now  generalized  to  stereo  and  motion  models.  The  first 
model  is  a  regionally  constant  displacement  Uk  and  the  ID  “images”  are  locally 
approximated  with  first  order  polynomials.  The  intuitive  understanding  of  this 
model  is  that  the  intensity  differences  between  the  two  images  are  “explained”  by 
the  displacement,  when  a  locally  planar  image  model  is  assumed.  The  expression 
to  be  minimized  with  respect  to  u*  is  (See  details  in  Appendix  A) 

Sk  =  X!  wi(hi  ~  uk  '  9i)2  (15) 

iejtk 

with  h{  the  regularized  intensity  difference  at  location  i  and  <7,  the  regularized 
estimate  of  the  local  gradient  of  the  average  of  both  images  at  location  i.  The 
Wi  are  the  weights  of  the  different  locations  within  the  support  Rk .  The  weighted 
least  squares  estimate  for  Uk  is 


*  _  Si tRk  lUjhjQi 

U"~ 


(16) 
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Therefore  the  sum  of  squared  deviations  of  region  k  at  the  optimal  estimated  value 
u*k  becomes 


Sk 


Y  Wi(hi  - 

itRk 

Y  Wifli  ~ 

itRk 


<  ■  9i)2 

{HitRk  wihi9i f 
ZitRk  Wi9i 


(17) 


With  this  residual  sum  of  squares  the  same  kind  of  distance  measure  D(k,  l )  as  in 
equation  13  can  be  generated. 

Although  the  sum  of  squared  deviations  is  formally  suitable  as  a  compatibility 
measure,  it  has  no  obvious  interpretation  nor  any  invariance  properties.  In  partic¬ 
ular  it  is  a  monotonously  decreasing  function  of  the  number  of  parameters,  and  it 
vanishes  when  there  are  as  many  parameters  as  data  points. 

A  better  measure  is  the  negative  log-likelihood  at  the  maximum  likelihood 
estimate 


Mfc  =  nfclog  (— )  (18) 

Vnfc/ 

This  has  a  probability  interpretation  for  a  given  model,  and  in  the  context  of  the 
Neyman-Pearson  test  it  has  been  successfully  used  for  gray  value  region  segmen¬ 
tation  [Yak75]. 

The  likelihood  also  does  not  take  into  account  the  complexity  of  a  model, 
however.  Therefore  by  choosing  a  sufficiently  complex  model  the  likelihood  can  be 
made  as  large  as  needed.  Only  the  minimum  description  length  makes  it  possible 
to  compare  models  of  different  complexity  in  a  consistent  way. 

As  described  in  the  last  section,  the  part  of  the  description  length  of  a  region 
that  is  not  related  to  the  geometry,  is  essentially 

DLk  =  y  •  log2  (”)  +  2  "  (19) 

where  C*  is  the  design  matrix  of  the  statistical  model.  In  the  simple  ID  flow  model 

it  is  T,ieRk9i- 

The  actual  compatibility  calculation  is  very  simple.  Suppose  there  are  two 
neighbouring  regions  Rk  and  Rj.  When  an  item  or  a  region  j  that  is  not  yet  part 
of  Rk  is  inserted  into  Rk,  then  the  statistical  part  of  the  description  length  changes 
by  the  amount 

A DL(k,j)  =  DLkuj  -  DLk  -  DLj  (20) 

If  region  j  is  to  be  compared  with  different  regions,  then  the  compatibility  measure 
can  be  simplified  due  to  the  constant  contribution  of  region  j 

D(j,k)  =  DLkuj-  DLk  (21) 
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This  “pseudo-distance”  (it  is  not  symmetric  and  may  well  be  negative)  can  now 
be  converted  into  a  relative  assignment  probability  Q(j,  k )  to  a  region  k  of  a  set 
K : 

9 -D(j,k) 

=  ZveK  ^22) 

This  obviously  does  not  include  the  probability,  that  j  may  not  belong  to  any  of 
the  possible  groups. 


4  Description  of  the  complete  scheme 

4.1  The  pyramidal  architecture  of  the  scheme 

A  key  to  the  whole  concept  is  a  pyramidal  architecture,  which  can  represent  both 
a  family  of  regularized  model  estimates  as  well  as  a  partition  of  the  image  with 
piecewise  homogeneous  regions  and  sharp  discontinuities  between  them.  This  ar¬ 
chitecture  has  first  been  used  for  gray  value  segmentation  and  the  applied  algo¬ 
rithm  has  become  known  as  pyramidal  linking  [BHR81,  PR81,  Bur84].  Although 
the  pyramidal  linking  algorithm  applied  there  has  proven  to  be  inadequate  for 
displacement  vector  field  segmentation,  the  underlying  pyramidal  architecture  will 
be  used  to  develop  a  new  scheme. 

The  architecture  of  this  pyramid  can  best  be  explained  in  one  dimension.  The 
extension  to  two  (or  more)  dimensions  can  be  done  in  a  straightforward  way  by 
applying  the  ID  concept  in  each  dimension.  Fig.  1  shows  the  graph  of  this  pyramid 
with  16  nodes  at  the  bottom  level,  representing  the  primary  image  data. 

The  key  property  of  this  pyramid  is  that  each  node  of  a  lower  level  (=  son  node) 
is  connected  to  two  nodes  (4  in  2D)  at  the  next  higher  level  (=  father  nodes),  except 
at  the  image  boundary,  where  each  son  node  is  connected  to  only  one  father  node. 
Each  father  node  has  links  to  4  son  nodes  (16  in  2D),  which  define  the  maximal 
support  for  any  estimation  at  the  father  node.  The  links  are  represented  by  weight 
factors  w(i,j)  between  0  and  1,  linking  son  node  i  to  father  node  j. 

Through  these  links  information  can  be  propagated  in  both  directions.  The 
bottom-up  propagation  of  signal  properties  P-m’  from  pyramid  level  (m)  to  level 
(m  +  1)  is  by  weighted  summation: 

pjm+l)=  £  w(i,j)p}m>  (23) 

i€son(j) 

The  top-down  propagation  from  level  (m  +  1)  to  level  ( m )  is  analogous: 

P'”'  =  £  (24) 

j^father(i) 
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Figure  1:  Architecture  of  the  pyramid  structure  used  in  ID 


For  normalization  purposes,  the  sum  of  the  weight  factors  is  constrained 

52  =  1  VJ  (25) 

t€son(j) 

This  means  that  each  son  node  contributes  with  the  total  weight  of  1  to  the  next 
level  of  the  pyramid.  When  links  to  father  nodes  are  cut  completely  in  order  to 
define  a  region,  then  obviously  all  weights  are  0.  This  aspect  will  only  be  dealt 
with  in  such  way  here,  that  a  maximum  pyramid  level  is  chosen  a  priori.  All  nodes 
at  this  level  are  assumed  to  define  regions. 

Both  these  processes  are  in  general  space-variant  linear  operators,  and  they 
can  be  interpreted  in  such  a  way,  that  each  father  node  represents  a  region  of 
those  son  nodes  with  non-zero  weight  factors.  In  the  general  case  these  regions  are 
overlapping. 

With  the  necessary  properties  propagated  up,  a  weighted  least  squares  estimate 
can  be  obtained  for  the  region  corresponding  to  a  father  node.  In  the  simplest 
case  of  a  model  with  regionally  constant  gray  values,  the  grayvalues  themselves 
and  the  number  of  contributing  pixels  (one  at  the  bottom  level)  are  propagated 
up  according  to  equation  23,  recursively  from  level  to  level,  up  to  the  top  level 
of  the  pyramid,  which  consists  of  only  one  node.  At  each  node  of  this  pyramid 
a  weighted  mean  value  can  be  estimated,  which  corresponds  to  the  image  region 
that  has  non-zero  links  to  this  node. 

When  each  father  node  has  the  same  set  of  binomially  distributed  weights,  then 
a  kind  of  Gaussian  pyramid  is  created.  When  the  weights  are  strictly  0  or  1,  then 
the  image  is  segmented  into  non-overlapping  regions,  and  the  weights  define  the 
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Figure  2:  Graph  of  a  complete  segmentation  in  ID 


partition. 

The  top-down  propagation  process  can  best  be  understood  when  the  image  is 
fully  segmented,  so  that  each  son  node  has  exactly  one  father  node,  to  which  it 
is  linked  with  a  non-zero  weight.  The  corresponding  graph  is  a  tree,  indicated  in 

Fig.  2. 

In  this  case  the  top-down  propagation  is  particularly  simple.  Starting  from 
the  root  nodes  of  the  tree,  which  represent  the  region  properties,  according  to 
equation  24  each  son  node  inherits  these  properties  from  its  father  node.  Therefore 
the  properties  of  the  whole  region  are  distributed  to  all  contributing  pixels,  which 
is,  for  example,  relevant  to  show  the  result  of  a  segmentation. 

The  other  extreme  is  the  situation  with  those  space-invariant  weights,  by  which 
the  Gaussian  pyramid  was  created.  Then  the  top-down  propagation  from  a  certain 
level  of  the  pyramid  effectively  means  a  smoothing  operation  of  the  properties  that 
were  propagated  up  from  the  bottom  level.  The  higher  the  referred  level  of  the 
pyramid,  the  larger  the  effective  filter  size. 

The  segmentation  task  can  be  formulated  within  this  pyramidal  architecture. 
It  is  assumed  that  the  image  consists  of  reasonably  large  coherent  regions.  Within 
these  regions  the  properties  are  assumed  to  be  homogeneous.  As  a  consequence 
the  linking  weights  in  the  interior  of  regions  can  be  assumed  to  be  arbitrarily 
distributed  as  long  as  the  corresponding  father  nodes  also  belong  to  the  same 
region.  In  particular  the  weights  can  be  binomially  distributed  with  respect  to 
a  particular  father  node,  which  is  the  initial  weight  configuration  of  the  scheme. 
The  critical  parts  of  the  image  are  its  discontinuities.  Discontinuities  are  locations 
where  the  linking  weights  from  neighboring  son  nodes  should  be  non-zero  only 
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Figure  3:  Random  dot  stereo  pair 

in  the  direction  pointing  away  from  the  discontinuity  towards  the  interior  of  the 
region.  This  applies  even  in  the  case  when  the  region  is  represented  by  just  one 
father  node.  On  the  other  hand,  if  evidence  can  be  found  that  the  links  from  two 
neighboring  son  nodes  should  point  in  opposite  directions,  a  local  hypothesis  of 
a  discontinuity  can  be  made.  The  segmentation  task  thus  consists  of  gathering 
evidence  about  potential  discontinuites  by  analysing  the  initial  pyramid  which 
has  space-invariant  binomially  distributed  linking  weights,  making  this  evidence 
consistent  within  and  between  the  pyramid  levels,  and  finally  changing  the  weights 
in  such  a  way  that  a  segmented  tree  or  at  least  something  close  to  it  results. 

For  the  purpose  of  illustration  an  example  of  a  random  dot  stereo  pair  will  be 
used  to  show  the  different  stages  of  the  method.  The  original  pair  is  displayed  in 
Fig  3.  In  the  center  of  the  second  image  a  square  is  displaced  by  2  pixels.  Both 
images  also  contain  random  noise. 

4.2  Initialization 

At  the  beginning  of  the  process  a  family  of  regularized  displacement  estimates  is 
constructed. 

The  basic  procedure  is  demonstrated  with  the  simple  model  of  locally  constant 
ID  displacements.  This  model  is  described  above  in  section  3.2  (see  also  Appendix 
A).  As  with  any  least  squares  model,  the  estimator  as  well  as  the  minimal  sum 
of  squared  deviations  can  be  calculated  by  summing  the  required  moments  over 
the  region  of  interest  and  performing  a  few  scalar  operations.  Given  the  set  of 
moments  at  each  pixel  any  degree  of  regularization  can  be  achieved  by  smoothing 
all  the  moments  before  calculating  the  desired  properties  with  scalar  operations  at 
each  level  of  smoothness.  Therefore  by  creating  the  Gaussian  pyramid  described 
above  with  all  the  moments  a  family  of  increasingly  regularized  vector  fields  is 
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available. 

Taking  the  local  model  of  constant  ID  flow  in  an  image  with  size  pixels 
(for  simplicity  only  one  index  is  used  here)  from  equations  16  and  17  it  follows 
that  for  u'k  and  for  5*  the  following  four  moments  are  required: 

\URk  J  \icR„  J  \itRk  /  / 

This  means  that  the  elements 

=  {l,hlh,g„9>}  (i  =  l.  ..»<»>)  (26) 

form  the  elements  of  the  bottom  level  of  the  pyramid. 

At  the  initialization  stage  a  family  of  fully  regularized  solutions  to  the  dis¬ 
placement  estimation  problem  is  created.  This  means  that  the  moments  of  the 
successive  levels  of  the  pyramid  are  formed  by  bottom-up  propagation  of  the  mo¬ 
ments  ot  their  16  son  nodes  at  the  level  below  (see  equation  2'6).  These  are  the  4 
closest  nodes  in  each  dimension.  The  weights  w(i,j)  define  the  support  over  which 
the  least  squares  model  is  estimated  at  level  j.  The  weighting  factors  are  such 
that  each  son  node  distributes  its  information  among  its  4  father  nodes  according 
to  equation  25.  This  guarantees  that  the  total  sum  of  contributing  data  elements 
is  constant  at  each  level  of  the  pyramid.  The  initial  weights  of  the  16  son  nodes 
are  binomially  distributed  in  each  dimension,  which  is  the  optimal  4  x  4-pixel  ap¬ 
proximation  to  a  2D  Gaussian  distribution.  This  optimizes  the  localization  of  the 
model  properties  at  the  coarser  levels  of  the  pyramid. 

Fig  4  shows  the  family  of  regularized  displacement  fields.  As  there  is  only  a  one 
dimensional  displacement,  the  scalar  fields  are  gray  level  encoded.  The  different 
levels  of  the  pyramid  show  displacement  fields  based  on  binomially  weighted  least 
squares  with  an  increasing  support. 


4.3  Measurement  of  compatibility 

Due  to  the  way  the  pyramid  is  constructed  it  can  be  interpreted  as  a  graph,  where 
each  son  node  is  connected  to  4  father  nodes  (in  2D)  at  the  next  higher  level  of  the 
pyramid,  which  represent  4  directions.  This  representation  has  been  used  before 
for  gray  value  and  texture  segmentation  [BHR81,  PR81,  Bur84],  but  the  algorithm 
described  there  is  not  adequate  for  the  segmentation  of  displacement  vector  fields. 

By  measuring  the  compatibility  to  all  4  father  nodes,  there  are  three  possible 
outcomes: 

1.  There  is  approximately  the  same  compatibility  to  all  father  nodes,  possibly 
with  minor  variations.  This  means  that  the  son  node  is  in  the  interior  of  a 
homogeneous  region. 
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Figure  4:  Family  of  regularized  displacement  estimates 
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2.  There  is  a  significant  preference  towards  a  certain  direction.  This  is  an  indi¬ 
cation  that  the  son  node  is  close  to  a  displacement  discontinuity,  especially 
when  the  corresponding  neighbor  node  shows  a  preference  in  the  other  di¬ 
rection.  Significance  in  this  context  has  the  interpretation  that  the  creation 
of  a  piece  of  occlusion  boundary  is  “cheaper”  in  terms  of  coding  length  than 
the  enforcement  of  the  model  across  this  boundary. 

3.  There  is  total  incompatibility  to  any  of  the  father  nodes.  This  is  an  indication 
that  at  the  pyramid  level  above  there  is  too  much  smoothing  with  respect 
to  this  particular  son  node.  Therefore  this  node  (with  all  nodes  “below”  it) 
represents  an  independent  region. 

Although  the  last  case  is  of  conceptual  interest  and  is  important  when  an  actual 
labelling  of  regions  is  performed,  it  can  in  many  cases  be  ignored  when  the  pyramid 
is  cut  at  a  level  that  contains  more  nodes  than  the  expected  number  of  distinct 
regions.  This  typically  is  a  level  of  4  —  12  pixels  square.  At  the  “computationally 
expensive”  levels  of  the  pyramid  only  the  first  two  alternatives  are  relevant. 

The  compatibility  D{i,j)  betweeen  a  node  i  and  its  father  node  j  is  measured 
by  the  difference  of  description  length  between  node  j  with  the  content  of  i  included 
and  node  j  with  the  content  of  node  i  removed.  Assuming  that  i  contributes  with 
the  weight  w(i,j)  to  j  before  the  measurement,  then  in  analogy  to  equation  21 
is  given  by 


D(i,j)  =  DL(Pjm+1)  +  (1  -  w(iJ))P}m))  -  DL(Pjm+1)  -  w(i,j)Pim))  (27) 


The  description  length  has  an  interpretation  as  the  negative  log  of  a  probability, 
so  D(i,j)  can  be  converted  to  a  normalized  assignment  probability  as  in  equation  22 


Qihj) 


2 -0D(i,j) 


(28) 


The  denominator  makes  sure,  that  the  total  assignment  probability  of  i  to  the  next 
level  is  exactly  1. 

The  introduction  of  /3  becomes  necessary  because  it  is  desirable  to  have  a 
similar  range  of  assignment  probabilities  at  each  level.  This  is  achieved,  when 
(3  is  the  inverse  of  the  number  of  pixels  contributing  to  the  father  node  j.  The 
interpretation  of  this  choice  is  that  the  “per  pixel  description  length”  is  measured. 
As  the  population  in  the  pyramid  increases  by  a  factor  4  with  each  level,  the  choice 
for  (3  is  4-m,  where  m  is  the  pyramid  level  (The  bottom  level  of  the  pyramid  is  0). 


4.4  Regularization  of  boundary  evidence 

After  the  measurement  of  compatibilities  between  the  pyramid  levels  at  each  node 
a  set  of  4  probabilities  is  given,  measuring  the  local  preference  of  each  node  to- 
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Figure  5:  Measured  assignment  probability  field  at  2nd  level 

wards  one  of  4  directions.  These  4  probabilities  are  related  to  the  4  directions 
NW,NE,SW,SE.  They  weigh  the  directional  preference  of  each  node  in  the  pyra¬ 
mid.  This  quadruple  of  directional  preference  can  be  transformed  into  a  vector 
field  which  directly  shows  the  directional  preference  of  a  node  and  the  strength  of 
this  preference.  Associating  the  S-N  direction  with  the  y-axis,  and  the  W-E  direc¬ 
tion  with  the  x-axis,  the  vector  field  components  (Fx,  Fy)  for  directional  preference 
are  determined  by 


Fx  =  (Qne  +  Qse)  —  ( Qnw  +  Qsw)  (29) 

Fy  —  (Q^e  +  Qnw)  —  ( Qse  +  Qsw )  (30) 

The  vector  field  has  only  2  components,  whereas  the  quadruple  representation 
has  3  (the  normalization  determines  the  4th).  This  is  due  to  the  possibility  that 
the  quadruple  could  represent  preferences  that  are  inconsistent  with  a  directional 
interpretation.  These  are,  however  not  considered  as  relevant.  Therefore  it  is 
possible  to  convert  both  representations  into  each  other. 

Fig.  5  shows  this  vector  field  at  the  second  level  of  the  pyramid.  When  the 
vector  field  is  observed,  it  can  be  easily  seen  that  it  is  very  noisy  in  comparison 
to  the  ideal  situation,  where  there  should  only  be  nonzero  vectors  on  both  sides  of 
the  displacement  discontinuities. 

The  discontinuities  appear  as  sinks  in  this  vector  field.  Therefore  a  good  oper¬ 
ator  to  measure  the  strength  of  discontinuities  is  the  divergence  of  this  field.  The 
lines  of  discontinuity  are  to  be  found  where  the  negative  valleys  of  the  divergence 
of  the  directional  preference  field  F  are.  The  negative  part  of  the  divergence  of  the 
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Figure  6:  Measured  boundary  evidence  at  all  levels 

field  at  all  levels  is  shown  in  Figure  6,  indicating  the  measured  boundary  evidence. 
The  sources  of  the  field  have  no  such  meaningful  interpretation,  so  for  clarity  the 
positive  part  of  the  divergence  is  set  to  0. 

As  can  be  seen  from  this  figure,  at  the  lower  levels  of  the  pyramid  the  assignment 
probability  vector  field  is  very  distorted.  By  sufficiently  smoothing  the  vector  field, 
however,  the  errors  are  averaged  out,  but  the  field  preserves  its  qualitative  essential 
properties.  This  can  be  seen  at  Fig  7,  where  the  vector  field  has  been  smoothed 
with  a  5  x  5  Gaussian  filter.  The  different  levels  of  the  pyramid  are  smoothed 
differently.  Obviously  the  lowest  level  has  to  be  smoothed  with  the  largest  filter 
mask.  The  sizes  of  these  masks  have  been  determined  heuristically. 

Smoothing  of  the  probability  quadruples  has  the  same  effect.  While  the  local 
directional  contrast  decreases  the  vector  field  becomes  more  coherent.  The  location 
of  the  sinks  is  also  preserved,  unless  there  is  another  sink  nearby.  In  that  case  the 
two  sinks  will  be  attracted  and  with  more  smoothing  they  will  merge  into  one. 

This  means  that  by  smoothing  the  probability  field,  smooth  and  stable  bound¬ 
ary  evidence  can  be  found.  A  boundary’s  location  will  be  correct,  if  there  is  no 
other  boundary  close  by.  In  other  words,  the  maximal  smoothing  area  should  be 
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Figure  7:  Boundary  evidence  from  smoothed  probability  field 
smaller  than  the  smallest  expected  distance  to  another  boundary. 

4.5  The  spatial  coincidence  constraint 

An  actual  discontinuity  should  show  up  at  the  same  location  at  multiple  scales. 
This  essentially  is  the  spatial  coincidence  assumption  formulated  by  David  Marr 
[Mar82]. 

This  constrains  the  possible  discontinuities  to  those  that  coincide  at  multiple 
scales.  In  the  proposed  approach  the  spatial  coincidence  constraint  is  used  to 
combine  the  evidence  of  several  levels  of  the  pyramid. 

Although  the  smoothing  at  a  single  level  improves  the  coherence  of  the  prob¬ 
ability  field  considerably,  there  are  still  undesirable  fluctuations,  especially  at  the 
highest  level  of  resolution.  This  is  a  natural  consequence  of  constructing  local 
estimates  from  only  few  measured  data  points. 

By  summing  over  the  probabilities  of  several  pyramid  levels,  the  fluctuations 
at  locations  of  no  discontinuity  can  be  expected  to  average  out.  Where  there 
are  discontinuities,  however,  the  probabilities  will  reinforce  each  other  in  their 
directional  preference. 

In  order  to  relate  the  probabilities  of  different  levels,  the  coarse  level  probability 
quadruples  must  be  projected  down  one  or  more  pyramid  levels  according  to  the 
formula,  which  is  analogous  to  equation  24: 

£  w(i, vij  (3i) 

J€father(i) 

This  is  optimally  done  with  the  binomially  distributed  weights  w(i,j)  that  were 
used  to  create  the  initial  pyramid. 

Fig.  8  shows  the  downprojected  divergence  of  the  field  shown  in  Fig.  7.  This 
shows  that  the  downprojection  effectively  performs  another  smoothing  step. 

When  the  smoothed  probability  quadruples  of  the  first  three  levels  are  added  in 
the  above-described  way,  the  resulting  field  has  the  divergence  shown  at  the  bottom 
of  the  pyramid  display  in  Fig.  9.  The  two  intermediate  levels  are  constructed  by 
collecting  evidence  from  the  levels  themselves  and  from  one  level  above.  The  fourth 
level  only  uses  its  own  evidence. 
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Figure  8:  Boundary  evidence  from  downprojected  probability  field 
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Figure  10:  Final  estimate  of  displacement  field 

4.6  Constructing  the  final  model 

As  the  probability  quadruples  express  a  directional  preference,  the  coherent  prob¬ 
ability  field  determined  by  the  regularization  step  and  the  spatial  coincidence  con¬ 
straint  can  now  serve  as  a  new  set  of  new  weights  that  define  the  properties  of  each 
nodes  father  nodes. 

Due  to  the  smoothing  the  “contrast”  of  the  probabilities  has  decreased,  so  that 
they  can’t  be  used  as  new  linking  weight  factors  directly.  Their  qualitative  trend, 
however,  is  correct  in  the  sense,  that  the  directional  preference  goes  to  opposing 
directions  at  the  hypothetical  lines  of  discontinuity.  The  conversion  of  this  trend 
to  actual  linking  weight  factors  is  done  up  to  now  in  a  very  heuristic  way.  The  final 
normalized  probability  quadrupels  are  taken  to  a  high  power,  say  20,  and  the  result 
is  normalized  again.  This  greatly  enhances  the  “contrast”  of  the  probabilities. 

With  these  weights  a  new  pyramid  is  created  according  to  the  bottom-up  prop¬ 
agation  scheme  of  equation  23,  up  to  the  pre-chosen  top  level  of  the  pyramid.  The 
moments  of  this  level  are  then  again  propagated  down  to  the  bottom  level.  When 
the  model  is  actually  determined  at  the  bottom  level,  the  result  can  be  visual¬ 
ized  and  used  for  further  computations.  Fig.  10  shows  this  final  estimation  of  the 
random  dot  example.  Although  this  is  not  yet  a  segmentation  in  the  strict  sense, 
it  is  obvious,  that  the  displacements  have  a  clear  discontinuity  around  the  black 
square. 

4.7  A  real  example 

Finally  the  method  is  tested  on  a  real  image  sequence.  The  two  frames  used 
from  this  sequence  are  displayed  at  Fig.  11.  One  can  see  that  the  can  is  on  a 
table.  Although  this  is  an  example  of  2D  flow,  the  model  of  which  is  described  in 
Appendix  B,  the  vertical  component  of  the  displacement  is  so  small  that  it  can  be 
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Figure  11:  Pair  from  real  image  sequence 

neglected.  Therefore  the  horizontal  component  can  be  displayed  the  same  way  as 
in  the  example  before.  Obviously  a  model  of  regionally  constant  flow  will  not  be 
able  to  capture  the  field  of  the  table  motion.  The  necessary  extension  is  to  use 
a  planar  model  for  the  displacement  as  described  in  Appendix  C.  The  result  of 
Fig.  12,  which  is  based  on  the  regionally  constant  model,  shows  clearly,  however, 
the  discontinous  motion  of  the  can  in  comparison  to  the  background,  except  at  the 
top  right  corner  of  the  can,  where  the  contrast  between  object  and  background  is 
so  low,  that  a  good  estimate  is  just  not  possible. 


5  Conclusions  and  discussion 

A  new  approach  has  been  presented  for  the  fast  determination  of  discontinuos 
displacement  vector  fields,  which  is  closely  related  to  the  problem  of  finding  a 
segmentation  of  the  vector  field.  The  last  step  of  the  actual  segmentation  has 
not  yet  been  taken,  this  will  be  a  natural  follow-up  project  with  its  own  specific 
problems. 

The  approach  is  based  on  regional  models  of  the  vector  field,  and  it  uses  gray 
level  data  as  primary  input,  either  the  original  gray  value  images  or  appropriately 
filtered  images. 

The  boundaries  and  the  discontinous  displacement  vector  field  can  be  found 
within  a  single  step,  no  iterations  of  any  procedure  are  required.  The  elegance 
of  the  scheme  is  that  boundaries  are  found  instantaneously  by  demanding  consis¬ 
tency  of  local  model  compatibility  measures  within  a  neighborhood  and  at  multiple 
scales.  In  that  way  the  “continuity  of  discontinuities”  and  the  “spatial  coincidence 
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Figure  12:  Resulting  estimate  of  ID  displacement  field 
constraint”  are  made  practical. 

The  current  constraint  on  the  boundary  is  simply  its  smoothness,  This  is  sim¬ 
ilar  to  the  chain-code  boundary  model  as  it  is  common  in  Markov  Random  Field 
models.  Obviously  there  are  more  efficient  boundary  models  such  as  polygons, 
quadratic  curves  or  Bezier  curves.  Those  will  most  probably  result  in  a  shorter 
code  and  will  therefore  allow  more  easily  the  segmentation  of  extremely  thin  ob¬ 
jects,  which  will  hardly  be  found  by  the  current  model. 

At  this  point  only  very  simple  models  have  actually  been  tried.  Although  the 
description  length  criterion  allows  the  comparison  of  different  models,  the  current 
implementation  will  need  to  be  extended  to  allow  this  comparison.  In  particular 
the  planar  and  the  second  order  models  will  have  to  be  tested  with  real  stereo  and 
motion  data.  One  aspect  to  consider  is  the  increased  storage  requirement,  when  a 
more  complex  model  is  applied. 

The  method  is  not  limited  to  segmentation  based  on  displacement  vector  fields, 
but  can  be  applied  with  any  model  that  can  be  set  up  as  a  regional  least  squares 
problem.  Therefore  the  same  method  also  might  be  used,  for  example,  for  texture 
based  segmentation. 

The  fact  that  the  information  about  discontinuities  is  represented  as  a  probabil¬ 
ity  field  allows  a  relatively  easy  integration  of  different  vision  modules.  The  same 
way  as  the  probabilities  from  different  scales,  those  from  different  vision  modules 
can  be  integrated.  There  is  no  inherent  problem  of  coupling  constants,  because 
the  probabilities  are  all  normalized. 
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Appendix 


A  Estimation  of  locally  constant  ID  flow 

The  idea  behind  this  model  is  that  all  intensity  differences  between  image  I^\x) 
and  image  x )  are  caused  by  the  constant  displacement  u.  In  order  to  minimize 
the  approximation  error,  both  images  are  treated  in  a  symmetric  way.  Therefore 
the  following  integral  has  to  be  minimized  with  respect  to  u : 

J  w(x)  (l(1){x  -  |)  -  I(2){x  +  |))  dx  (32) 

The  weighting  function  w(x)  defines  the  support  of  the  functional  and  the  weights 
of  the  individual  measurements  within  this  support.  By  a  local  linear  expansion 
the  /<*>(*)  are  approximated: 

/(fc)( x  +  u)  as  I{0k\x)  +  u  •  4fc)(x)  (33) 

Now  it  is  possible  to  restrict  x  to  the  grid  coordinates  i.  With 

k  =  4‘’(‘)  -  42,« 


and 

*  =  5  (4l,(i)  +  4”M) 

the  integral  to  be  minimized  is  transformed  into  a  sum  over  the  discrete  support 
R  and  the  weights  Wi,  the  sum  of  squared  deviations  Sr: 

Sr  =  ^lwi{hi  -  u- gi)2  (34) 

itR 

This  has  the  least  squares  solution 

•  EmA  wihi9i 

U  =  — — - r- 

£;«*  ™i9i 

and  the  minimal  sum  of  squared  deviations  Sr 


S 


R 


53  Wih2 

>£R 


53  wtf 

i£R 


(Ei€*  »<*«*)’ 

Ei€R  U>i9i 

(36) 

(37) 
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B  Estimation  of  locally  constant  2D  flow 


The  derivation  is  very  similar  to  the  ID  case,  with  the  difference  that  now  the 
displacement  has  the  two  components  u  and  v  and  the  gray  value  gradients  have 
the  components  gxi  and  g^.  The  weighted  sum  of  squared  deviations  becomes 


Sr  =  X!  wi  (hi  ~  u  •  9xi  ~  v  •  g*)2 

itR 


with  the  least  squares  solution 


/  «*  \  _  (  E,  nigh  T.iWigxigyi  \  /  ZiWihigxi  \ 

\  v*  )  \  ZiWig^gyi  Zi^igli  )  \  HiWihigyi  ) 

and  the  minimal  sum  of  squared  deviations 


=  ?-(«* 
i 


E,  Wig2* 
Zi  Wigxi9yi 


HiWig^g^  \  /  u* 

Zi  Wig*  )  \v 


(38) 

(39) 

(40) 


C  Estimation  of  planar  ID  flow  in  a  2D  image 

This  is  the  planar  surface  model  relevant  for  stereo,  when  the  images  are  given  in 
normal  coordinates  and  the  epipolar  line  is  parallel  to  the  x-axis.  The  displacement 
model  now  depends  on  the  image  coordinates  (*i,J/i) 

«(*.»&)  =  +  axi  +  byi  (41) 

The  weighted  sum  of  squared  deviations  Sr  becomes 

Sr  =  YL  wi  “  («o  +  ax,  +  byi)gi)2  (42) 

t 

and  the  least  squares  estimates  are 

/  «o\  (  E,  Wig2  E«  wi9ixi  Zi  Wight  \  /  Z^ih^  \ 

a  =  Zi  w^Xi  Zi  Wigh2  E,  w^XiVi  Zi  (43) 

\  b  /  \  Zi  Wighi  Zi  Wigfxiyi  ZiWigfy2  /  \  ZiWih^ y{  ) 

The  minimal  sum  of  squared  deviations  is  analogous  to  the  previous  cases. 


D  Stochastic  Complexity  and  regression 

Unfortunately  the  criterion  described  in  equation  9  is  not  invariant  with  respect 
to  scale  change  of  the  the  data.  If  both  the  observations  as  well  as  the  regressor 
variables  are  multiplied  with  a  factor  s,  then  the  additional  term 

(k  +  n)  log2  s 
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appears,  although  the  parameters  are  identical  in  both  cases.  This  can  influence 
the  decision,  which  set  of  regressor  variables  is  optimal,  which  is  certainly  not 
desirable. 

J.  Rissanen  has  introduced  the  concept  of  “Stochastic  Complexity”,  which  elim¬ 
inates  all  possible  redundancy  of  a  coding  system  [Ris87,  Ris89].  For  a  model  class 
Mk  it  is  defined  as 

I(x\Mk)  =  -  log  J  P(x\$)dx(0)  (44) 

Essentially  the  difference  to  the  minimal  description  length  (MDL)  is  the  inte¬ 
gration  over  all  possible  “priors”  or  models  instead  of  chosing  the  one,  where  the 
product  under  the  integral  is  minimal.  As  the  integral  is  larger  than  the  maxi¬ 
mum  of  its  elements,  naturally  I(x\Mk)  is  smaller  than  the  minimal  description 
length.  Although  asymptotically  the  Stochastic  Complexity  is  identical  with  the 
criteria  above,  it  is  optimal  for  data  sets  of  any  size  and  has  the  required  invariance 
properties. 

The  problem  is,  that  the  Stochastic  Complexity  can’t  be  calculated  analytically 
in  many  cases.  Fortunately  there  is  a  closed  form  solution  for  the  regression  prob¬ 
lem.  It  requires  the  parametrization  of  the  the  prior  distribution  for  the  variance 
cr  in  the  form  of  the  conjugate  priors 

=  *-*(§)  (45) 

and  for  the  parameters  as  a  normal  distribution  ir(Oi)  with  zero  mean  and  variance 
-,  with  a  and  c  as  nuisance  parameters.  The  Stochastic  Complexity  is  optimized 
with  respect  to  the  nuisance  parameters.  The  derivation  is  described  in  [Ris89, 
page  126-130],  here  just  the  result  is  reported.  The  design  matrix  £  =  GTG  is 
extended  to  £c  =  cl  +  GTG,  which  prevents  it  from  becoming  singular.  Then  the 
minimum  sum  of  squared  deviations  also  depends  on  c: 

Sc  =  xTx  -  xtG{cI  +  GtG)~1Gtx  (46) 


The  complete  Stochastic  Complexity  is 

Hx\g,c)  =  £  log2  Sc  +  -  log 2 1 Ec | 


fc,  ,  n+1,  7r(n  +  l)  1,  ^n  +  l^ 

"2  l0g!  C  +  — l0g!  — —  -  2  l0gj  r(~) 


(47) 


The  minimization  with  respect  to  c  has  to  be  done  numerically  with  the  itera¬ 
tion: 

Sc 

C  =  SctrSc  +  (n  +  \)xTGTl~2GTx  (48) 

It  has  been  proven  [Ris90],  that  the  Stochastic  Complexity  found  this  way  is  in¬ 
variant  with  respect  to  scale  changes  of  regressor  variables  and  observations. 
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